1 Data

Load the student scores for the test. Here we load the ETH Zurich test data, downloaded from https://pontifex.ethz.ch/s21t5/rate/

Show data summary
skim(test_scores)
Data summary
Name test_scores
Number of rows 9671
Number of columns 44
_______________________
Column type frequency:
character 3
numeric 41
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 4 4 0 5 0
class 0 1 13 13 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
z1_Z1 0 1.00 0.97 0.21 0 1 1 1 2 ▁▁▇▁▁
z2_Z0 6238 0.35 0.43 0.62 0 0 0 1 2 ▇▁▃▁▁
z3_Z2 0 1.00 0.70 0.51 0 0 1 1 2 ▃▁▇▁▁
z4_Z3 0 1.00 0.66 0.51 0 0 1 1 2 ▅▁▇▁▁
z5_Z4 0 1.00 1.03 0.72 0 1 1 2 2 ▃▁▇▁▅
z6_Z5 0 1.00 1.06 0.51 0 1 1 1 2 ▁▁▇▁▂
z7_Z6 0 1.00 0.72 0.47 0 0 1 1 2 ▃▁▇▁▁
z8_Z7 0 1.00 0.80 0.65 0 0 1 1 2 ▅▁▇▁▂
z9_Z8 0 1.00 1.08 0.74 0 1 1 2 2 ▅▁▇▁▆
z10_Z9 0 1.00 1.01 0.68 0 1 1 1 2 ▃▁▇▁▃
z11_Z10 0 1.00 0.93 0.64 0 1 1 1 2 ▃▁▇▁▂
z12_Z0 6238 0.35 0.90 0.54 0 1 1 1 2 ▂▁▇▁▁
z13_Z0 6238 0.35 0.60 0.69 0 0 0 1 2 ▇▁▆▁▂
z14_Z12 0 1.00 0.81 0.64 0 0 1 1 2 ▅▁▇▁▂
z15_Z13 0 1.00 0.74 0.69 0 0 1 1 2 ▇▁▇▁▂
z16_Z14 0 1.00 0.61 0.80 0 0 0 1 2 ▇▁▂▁▃
z17_Z15 0 1.00 0.73 0.58 0 0 1 1 2 ▅▁▇▁▁
z18_Z16 0 1.00 0.78 0.80 0 0 1 1 2 ▇▁▆▁▅
z19_Z0 6238 0.35 0.95 0.81 0 0 1 2 2 ▇▁▇▁▇
z20_Z17 0 1.00 0.83 0.50 0 1 1 1 2 ▂▁▇▁▁
z21_Z18 0 1.00 0.92 0.69 0 0 1 1 2 ▅▁▇▁▃
z22_Z19 0 1.00 1.00 0.70 0 1 1 1 2 ▃▁▇▁▃
z23_Z20 0 1.00 0.90 0.75 0 0 1 1 2 ▆▁▇▁▅
z24_Z21 0 1.00 0.89 0.64 0 0 1 1 2 ▃▁▇▁▂
z25_Z0 6238 0.35 0.72 0.53 0 0 1 1 2 ▅▁▇▁▁
z26_Z22 0 1.00 1.07 0.44 0 1 1 1 2 ▁▁▇▁▂
z27_Z23 0 1.00 1.02 0.77 0 0 1 2 2 ▆▁▇▁▆
z28_Z24 0 1.00 1.08 0.75 0 1 1 2 2 ▅▁▇▁▆
z29_Z25 0 1.00 0.88 0.70 0 0 1 1 2 ▅▁▇▁▃
z30_Z0 6238 0.35 0.90 0.50 0 1 1 1 2 ▂▁▇▁▁
z31_Z27 0 1.00 0.63 0.69 0 0 1 1 2 ▇▁▆▁▂
z32_Z28 0 1.00 0.70 0.84 0 0 0 1 2 ▇▁▃▁▃
z33_Z29 0 1.00 0.83 0.46 0 1 1 1 2 ▂▁▇▁▁
z34_Z0 6238 0.35 1.14 0.62 0 1 1 2 2 ▂▁▇▁▃
z35_Z0 6238 0.35 1.31 0.76 0 1 1 2 2 ▃▁▅▁▇
z36_Z0 6238 0.35 1.14 0.87 0 0 1 2 2 ▆▁▃▁▇
z0_Z11 3433 0.65 0.70 0.57 0 0 1 1 2 ▅▁▇▁▁
z0_Z26 3433 0.65 0.80 0.65 0 0 1 1 2 ▅▁▇▁▂
z0_Z30 3433 0.65 1.04 0.63 0 1 1 1 2 ▂▁▇▁▃
z0_Z31 3433 0.65 0.96 0.79 0 0 1 2 2 ▇▁▇▁▆
z0_Z32 3433 0.65 1.37 0.81 0 1 2 2 2 ▃▁▃▁▇

The scores are:

  • 0 if the student gave an incorrect response
  • 1 if the student gave a correct response
  • 2 if the student chose the “I don’t know” answer option
  • NA if there was no response recorded

For this analysis, we replace the “2 = I don’t know” scores with 0, treating “I don’t know” as an incorrect response.

test_scores <- test_scores %>% 
  mutate(across(starts_with("z"), ~ ifelse(. == 2, 0, .)))

1.1 Data summary

The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n
2017 1682
2018 1751
2019 1999
2020 2191
2021 2048

The distribution of raw scores is roughly normal, and similar from year to year:

total_scores <- test_scores %>% mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))

total_scores %>%
  ggplot(aes(x = total)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(year)) +
  theme_minimal()

Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):

test_scores %>% 
  select(-class, -test_version) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2017 2018 2019 2020 2021
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
z1_Z1 100% 0 0.95 0.22 100% 0 0.95 0.21 100% 0 0.96 0.21 100% 0 0.96 0.20 100% 0 0.95 0.21
z2_Z0 100% 0 0.31 0.46 100% 0 0.28 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z3_Z2 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.67 0.47 100% 0 0.66 0.48 100% 0 0.67 0.47
z4_Z3 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.60 0.49
z5_Z4 100% 0 0.47 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50
z6_Z5 100% 0 0.71 0.46 100% 0 0.73 0.44 100% 0 0.74 0.44 100% 0 0.76 0.42 100% 0 0.74 0.44
z7_Z6 100% 0 0.71 0.46 100% 0 0.70 0.46 100% 0 0.69 0.46 100% 0 0.70 0.46 100% 0 0.68 0.47
z8_Z7 100% 0 0.47 0.50 100% 0 0.50 0.50 100% 0 0.52 0.50 100% 0 0.60 0.49 100% 0 0.58 0.49
z9_Z8 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.43 0.50 100% 0 0.42 0.49
z10_Z9 100% 0 0.54 0.50 100% 0 0.54 0.50 100% 0 0.55 0.50 100% 0 0.55 0.50 100% 0 0.53 0.50
z11_Z10 100% 0 0.57 0.49 100% 0 0.58 0.49 100% 0 0.58 0.49 100% 0 0.59 0.49 100% 0 0.57 0.50
z12_Z0 100% 0 0.68 0.47 100% 0 0.71 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z13_Z0 100% 0 0.37 0.48 100% 0 0.36 0.48 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z14_Z12 100% 0 0.56 0.50 100% 0 0.57 0.49 100% 0 0.59 0.49 100% 0 0.54 0.50 100% 0 0.54 0.50
z15_Z13 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.47 0.50 100% 0 0.46 0.50 100% 0 0.45 0.50
z16_Z14 100% 0 0.19 0.39 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40
z17_Z15 100% 0 0.59 0.49 100% 0 0.60 0.49 100% 0 0.60 0.49 100% 0 0.59 0.49 100% 0 0.57 0.49
z18_Z16 100% 0 0.38 0.49 100% 0 0.36 0.48 100% 0 0.28 0.45 100% 0 0.27 0.45 100% 0 0.28 0.45
z19_Z0 100% 0 0.35 0.48 100% 0 0.34 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z20_Z17 100% 0 0.72 0.45 100% 0 0.72 0.45 100% 0 0.74 0.44 100% 0 0.71 0.46 100% 0 0.71 0.45
z21_Z18 100% 0 0.51 0.50 100% 0 0.52 0.50 100% 0 0.54 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50
z22_Z19 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.50 0.50
z23_Z20 100% 0 0.41 0.49 100% 0 0.43 0.50 100% 0 0.42 0.49 100% 0 0.42 0.49 100% 0 0.42 0.49
z24_Z21 100% 0 0.59 0.49 100% 0 0.58 0.49 100% 0 0.60 0.49 100% 0 0.56 0.50 100% 0 0.57 0.50
z25_Z0 100% 0 0.54 0.50 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z26_Z22 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.81 0.39 100% 0 0.80 0.40
z27_Z23 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.42 0.49 100% 0 0.40 0.49 100% 0 0.41 0.49
z28_Z24 100% 0 0.42 0.49 100% 0 0.46 0.50 100% 0 0.45 0.50 100% 0 0.42 0.49 100% 0 0.41 0.49
z29_Z25 100% 0 0.48 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50 100% 0 0.49 0.50 100% 0 0.50 0.50
z30_Z0 100% 0 0.76 0.43 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z31_Z27 100% 0 0.38 0.48 100% 0 0.37 0.48 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.40 0.49
z32_Z28 100% 0 0.23 0.42 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.20 0.40
z33_Z29 100% 0 0.78 0.42 100% 0 0.76 0.43 100% 0 0.77 0.42 100% 0 0.74 0.44 100% 0 0.75 0.43
z34_Z0 100% 0 0.58 0.49 100% 0 0.61 0.49 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z35_Z0 100% 0 0.32 0.47 100% 0 0.33 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z36_Z0 100% 0 0.22 0.41 100% 0 0.23 0.42 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
z0_Z11 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.57 0.50 100% 0 0.59 0.49 100% 0 0.58 0.49
z0_Z26 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.54 0.50 100% 0 0.53 0.50 100% 0 0.52 0.50
z0_Z30 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.61 0.49 100% 0 0.61 0.49 100% 0 0.60 0.49
z0_Z31 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.35 0.48 100% 0 0.33 0.47 100% 0 0.45 0.50
z0_Z32 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.21 0.41

2 Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data from the two versions of the test has large blocks of “missing” data (e.g. no responses to the new questions from students who completed the old test), we cannot carry out the factor analysis as in the analyse-test script: the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

3 Fitting a 2-parameter logistic IRT model

We use the mirt package to fit an item response theory model.

For this analysis, we use the two-parameter logistic (2PL) model, which estimates a discrimination and a difficulty parameter for each item.
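The 2PL model gives each item a discrimination parameter \(a\) and a difficulty parameter \(b\); the probability of a correct response is a logistic function of ability \(\theta\). A minimal sketch (with made-up parameter values, not fitted estimates):

```r
# 2PL item response function: probability of a correct response from a
# student with ability theta, for an item with discrimination a and
# difficulty b (illustrative values only; the actual estimates are fitted below)
p_2pl <- function(theta, a, b) {
  1 / (1 + exp(-a * (theta - b)))
}

# At theta = b the probability is exactly 0.5, regardless of a
p_2pl(theta = 0.5, a = 1.2, b = 0.5)
```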

Show model fitting output
# save just the matrix of scores
item_scores <- test_scores %>% 
  select(matches("^z\\d"))

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  #removeEmptyRows = TRUE, 
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -186638.319, Max-Change: 4.61593
Iteration: 2, Log-Lik: -173222.852, Max-Change: 3.34257
Iteration: 3, Log-Lik: -172025.296, Max-Change: 0.75461
Iteration: 4, Log-Lik: -171559.893, Max-Change: 0.30299
Iteration: 5, Log-Lik: -171310.677, Max-Change: 0.29565
Iteration: 6, Log-Lik: -171163.155, Max-Change: 0.15660
Iteration: 7, Log-Lik: -171055.541, Max-Change: 0.15756
Iteration: 8, Log-Lik: -170980.492, Max-Change: 0.10202
Iteration: 9, Log-Lik: -170924.986, Max-Change: 0.13257
Iteration: 10, Log-Lik: -170881.349, Max-Change: 0.08812
Iteration: 11, Log-Lik: -170846.140, Max-Change: 0.07497
Iteration: 12, Log-Lik: -170823.782, Max-Change: 0.04927
Iteration: 13, Log-Lik: -170807.153, Max-Change: 0.06731
Iteration: 14, Log-Lik: -170793.659, Max-Change: 0.03988
Iteration: 15, Log-Lik: -170783.377, Max-Change: 0.04419
Iteration: 16, Log-Lik: -170775.383, Max-Change: 0.02498
Iteration: 17, Log-Lik: -170768.350, Max-Change: 0.03335
Iteration: 18, Log-Lik: -170762.835, Max-Change: 0.01562
Iteration: 19, Log-Lik: -170757.759, Max-Change: 0.02149
Iteration: 20, Log-Lik: -170753.836, Max-Change: 0.01406
Iteration: 21, Log-Lik: -170750.152, Max-Change: 0.01118
Iteration: 22, Log-Lik: -170744.133, Max-Change: 0.01296
Iteration: 23, Log-Lik: -170741.889, Max-Change: 0.01003
Iteration: 24, Log-Lik: -170740.015, Max-Change: 0.00919
Iteration: 25, Log-Lik: -170732.968, Max-Change: 0.00538
Iteration: 26, Log-Lik: -170732.507, Max-Change: 0.00457
Iteration: 27, Log-Lik: -170732.168, Max-Change: 0.00410
Iteration: 28, Log-Lik: -170730.815, Max-Change: 0.00252
Iteration: 29, Log-Lik: -170730.746, Max-Change: 0.00278
Iteration: 30, Log-Lik: -170730.686, Max-Change: 0.00187
Iteration: 31, Log-Lik: -170730.559, Max-Change: 0.00125
Iteration: 32, Log-Lik: -170730.538, Max-Change: 0.00104
Iteration: 33, Log-Lik: -170730.519, Max-Change: 0.00099
Iteration: 34, Log-Lik: -170730.435, Max-Change: 0.00068
Iteration: 35, Log-Lik: -170730.427, Max-Change: 0.00035
Iteration: 36, Log-Lik: -170730.424, Max-Change: 0.00018
Iteration: 37, Log-Lik: -170730.419, Max-Change: 0.00019
Iteration: 38, Log-Lik: -170730.417, Max-Change: 0.00018
Iteration: 39, Log-Lik: -170730.416, Max-Change: 0.00016
Iteration: 40, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 41, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 42, Log-Lik: -170730.407, Max-Change: 0.00013
Iteration: 43, Log-Lik: -170730.403, Max-Change: 0.00056
Iteration: 44, Log-Lik: -170730.401, Max-Change: 0.00046
Iteration: 45, Log-Lik: -170730.399, Max-Change: 0.00046
Iteration: 46, Log-Lik: -170730.398, Max-Change: 0.00029
Iteration: 47, Log-Lik: -170730.397, Max-Change: 0.00011
Iteration: 48, Log-Lik: -170730.397, Max-Change: 0.00005
## 
## Calculating information matrix...

3.1 Local independence

We compute Yen’s (1984) \(Q_3\) statistic to check for residual dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
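The idea behind \(Q_3\) can be sketched on simulated data: for each item, form the residual between the observed score and the model-expected score, then correlate the residuals for a pair of items. For locally independent items this correlation should be near zero (toy 2PL items with made-up parameters):

```r
# Simulate two locally independent 2PL items and compute Q3 for the pair
set.seed(42)
n <- 5000
theta <- rnorm(n)                            # latent abilities
p1 <- 1 / (1 + exp(-1.2 * (theta - 0.2)))    # expected scores, item 1
p2 <- 1 / (1 + exp(-0.8 * (theta + 0.5)))    # expected scores, item 2
x1 <- rbinom(n, 1, p1)                       # observed scores
x2 <- rbinom(n, 1, p2)
q3 <- cor(x1 - p1, x2 - p2)                  # near 0 under local independence
```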

# Q3 statistics for each item pair, from the fitted model
residuals_2pl <- residuals(fit_2pl, type = "Q3") %>% as.data.frame()

residuals_2pl %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only a couple of pairs showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("z"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
z18_Z16 z19_Z0 0.6739582
z34_Z0 z35_Z0 0.2105944

In fact, both of these pairs highlight questions that were removed from the test:

  • z18 and z19 are on the product and quotient rules; only z18 was retained on the new test.

  • z34 and z35 are both about vectors; only z34 was retained (in modified form, as Z30).

Given that this violation of the local independence assumption is very mild, we proceed using this model.

3.2 Model parameters

We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. For the estimation we use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates by the specified method. We use the default, the “expected a posteriori” (EAP) method.
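For intuition, the EAP estimate is the posterior mean of \(\theta\) given a response pattern, computed here by simple quadrature with a standard normal prior. This is a didactic sketch, not mirt's actual implementation:

```r
# EAP ability estimate by quadrature: the posterior mean of theta given a
# binary response pattern x, item discriminations a, and difficulties b
eap_theta <- function(x, a, b, nodes = seq(-6, 6, by = 0.01)) {
  lik <- sapply(nodes, function(theta) {
    p <- 1 / (1 + exp(-a * (theta - b)))
    prod(p^x * (1 - p)^(1 - x))   # likelihood of the pattern at this node
  })
  post <- lik * dnorm(nodes)      # unnormalised posterior (standard normal prior)
  sum(nodes * post) / sum(post)   # posterior mean
}

# All-correct on three average items gives a positive ability estimate
eap_theta(x = c(1, 1, 1), a = rep(1, 3), b = rep(0, 3))
```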

test_scores <- test_scores %>%
  # fscores() returns a one-column matrix; take the column as a plain vector
  mutate(F1 = fscores(fit_2pl, method = "EAP")[, 1])

We can also extract the model coefficient estimates with the generic coef() function. We set the IRTpars argument to TRUE, so that the slope-intercept parameters are converted into traditional IRT parameters.
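As a reminder of what this conversion does: mirt fits the 2PL in slope-intercept form, \(P = 1/(1+e^{-(a\theta + d)})\), and IRTpars = TRUE reports the traditional difficulty \(b = -d/a\) instead. A quick check with made-up values:

```r
# Slope-intercept vs traditional 2PL parameterisation (illustrative values)
a <- 1.5
d <- 0.9
b <- -d / a   # traditional difficulty

# The two forms give identical probabilities at any theta
theta <- 0.3
p_slope_int <- 1 / (1 + exp(-(a * theta + d)))
p_trad      <- 1 / (1 + exp(-a * (theta - b)))
```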

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs_2pl is a list, with one element for each question, and an additional GroupPars element that we won’t be using. For each question, the object records several values:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown

To make this output a little more user friendly, we use the tidy_mirt_coefs function that we have provided, to produce a single data frame with a row for each question.

source("fn_tidy_mirt_coefs.R")

tidy_2pl <- tidy_mirt_coefs(coefs_2pl) %>% 
  filter(par %in% c("a", "b")) %>%  
  left_join(test_versions %>% select(-item_num), by = c("Question" = "label"))

tidy_2pl_wide <- tidy_2pl %>% 
  pivot_wider(names_from = "par", values_from = c(est, CI_2.5, CI_97.5), names_glue = "{par}_{.value}")

Here is a nicely formatted table of the result:

tidy_2pl %>%
  select(-pre,-post,-notes) %>% 
  group_by(outcome) %>% 
  mutate(ci = str_glue("[{round(CI_2.5, 2)}, {round(CI_97.5, 2)}]")) %>% 
  select(-starts_with("CI_")) %>% 
  pivot_wider(names_from = "par", values_from = c(est, ci), names_glue = "{par}_{.value}") %>% 
  gt() %>% 
  fmt_number(columns = contains("_est"), decimals = 2) %>%
  data_color(
    columns = contains("a_est"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_est"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_ci = "95% CI",
    b_ci = "95% CI"
  )
Question description Discrimination Difficulty
Est. 95% CI Est. 95% CI
unchanged
z1_Z1 linear function 1.26 [1.14, 1.38] −2.92 [-3.13, -2.71]
z3_Z2 arithmetic rules 1.29 [1.23, 1.36] −0.64 [-0.69, -0.6]
z4_Z3 Easy ineq. 1.15 [1.09, 1.21] −0.57 [-0.62, -0.52]
z5_Z4 logs 1.43 [1.35, 1.5] 0.05 [0.01, 0.09]
z6_Z5 logs 1.51 [1.43, 1.6] −0.97 [-1.02, -0.92]
z7_Z6 graph translation 1.37 [1.3, 1.44] −0.82 [-0.87, -0.77]
z8_Z7 sine pi/3 1.18 [1.11, 1.24] −0.18 [-0.22, -0.13]
z9_Z8 trig.ineq. 1.98 [1.89, 2.08] 0.17 [0.14, 0.2]
z10_Z9 trig.identity 1.74 [1.66, 1.83] −0.16 [-0.19, -0.12]
z11_Z10 period 1.61 [1.53, 1.69] −0.30 [-0.34, -0.26]
z14_Z12 limit 1.47 [1.4, 1.55] −0.23 [-0.27, -0.19]
z15_Z13 series 0.83 [0.78, 0.88] 0.21 [0.16, 0.27]
z16_Z14 diff.quotient 1.41 [1.32, 1.49] 1.34 [1.28, 1.41]
z17_Z15 graph f' 1.22 [1.15, 1.28] −0.39 [-0.43, -0.34]
z18_Z16 product rule 1.08 [1.02, 1.15] 0.91 [0.85, 0.97]
z20_Z17 (exp)' 2.18 [2.06, 2.29] −0.75 [-0.78, -0.71]
z21_Z18 (ln(sin))' 2.10 [2, 2.2] −0.07 [-0.1, -0.03]
z22_Z19 hyp.functions 1.99 [1.9, 2.09] −0.05 [-0.08, -0.01]
z23_Z20 slope tangent 1.94 [1.85, 2.03] 0.26 [0.23, 0.3]
z24_Z21 IVT 1.41 [1.34, 1.49] −0.31 [-0.35, -0.27]
z26_Z22 int(poly) 2.38 [2.25, 2.51] −1.06 [-1.1, -1.02]
z27_Z23 int(exp) 1.89 [1.8, 1.99] 0.32 [0.29, 0.36]
z28_Z24 Int = 0 1.85 [1.76, 1.94] 0.23 [0.2, 0.27]
z29_Z25 int even funct 0.83 [0.78, 0.88] 0.00 [-0.05, 0.06]
z31_Z27 int(abs) 1.29 [1.23, 1.36] 0.45 [0.4, 0.49]
z32_Z28 FtoC algebra 1.52 [1.44, 1.61] 1.22 [1.16, 1.28]
z33_Z29 difference vectors 0.99 [0.93, 1.06] −1.37 [-1.46, -1.29]
removed
z2_Z0 3D 0.55 [0.47, 0.64] 1.69 [1.43, 1.96]
z12_Z0 rational exponents 0.88 [0.78, 0.97] −1.11 [-1.24, -0.98]
z13_Z0 ellipsoid 0.71 [0.62, 0.79] 0.86 [0.73, 1]
z19_Z0 quotient rule 1.27 [1.16, 1.38] 0.65 [0.58, 0.73]
z25_Z0 velocity 0.53 [0.45, 0.61] −1.14 [-1.35, -0.94]
z30_Z0 FtoC graph 1.25 [1.13, 1.37] −1.10 [-1.2, -1]
z34_Z0 normal vector 0.89 [0.8, 0.99] −0.52 [-0.62, -0.43]
z35_Z0 intersect planes 1.26 [1.15, 1.37] 0.74 [0.66, 0.82]
z36_Z0 parallel planes 1.23 [1.11, 1.35] 1.30 [1.19, 1.41]
added
z0_Z11 rational exponents 0.77 [0.7, 0.84] −0.48 [-0.56, -0.4]
z0_Z26 FtoC graph 1.08 [1.01, 1.16] −0.15 [-0.2, -0.09]
z0_Z30 normal vector 0.81 [0.75, 0.88] −0.59 [-0.67, -0.51]
z0_Z31 vector product 1.01 [0.94, 1.09] 0.62 [0.55, 0.69]
z0_Z32 scalar product 1.32 [1.22, 1.41] 1.32 [1.24, 1.41]

These values are also saved to the file output/eth_post_2pl-results.csv.

tidy_2pl_wide %>% 
  write_csv("output/eth_post_2pl-results.csv")

This shows the difficulty and discrimination of each of the questions, highlighting those that were added or removed:

tidy_2pl_wide %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est,
    label = case_when(
      outcome == "unchanged" ~ "",
      outcome == "removed" ~ pre,
      outcome == "added" ~ post
    ),
    colour = outcome
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel() +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

3.3 Comparing years and classes

Do students from different programmes of study have different distributions of ability?

Compare the distribution of abilities in the year groups:

ggplot(test_scores, aes(F1, y = year, fill = as.factor(year), colour = as.factor(year))) +
  geom_density_ridges(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()
## Picking joint bandwidth of 0.189

3.4 Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(fit_2pl, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

plot(fit_2pl, type = "infoSE", main = "Test information")

3.4.1 Item information curves

The information curves for each question help to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()
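The shape of these curves follows directly from the 2PL form: an item's information is \(I(\theta) = a^2 P(\theta)(1 - P(\theta))\), which peaks at \(\theta = b\) with maximum \(a^2/4\). A small sketch with made-up parameters:

```r
# 2PL item information: I(theta) = a^2 * P * (1 - P); highly discriminating
# items are informative over a narrow ability range around their difficulty
info_2pl <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))
  a^2 * p * (1 - p)
}

# Peak information, at theta = b, for an item with a = 2 is a^2 / 4 = 1
info_2pl(theta = 0.5, a = 2, b = 0.5)
```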

3.5 Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_irt <- areainfo(fit_2pl, c(-4,4))
info_irt %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 52.88075 54.46439 0.9709234 41

This shows that the total information across all 41 items is 54.46, of which about 97% lies in the range \(-4\leq\theta\leq4\).
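As a sanity check on these areas: for a single 2PL item, the information curve integrates to exactly \(a\) over the whole real line (since \(a^2 P(1-P) = a\,dP/d\theta\)), so the area over a finite range falls just short of \(a\). A sketch with made-up parameters:

```r
# Numerically integrate one 2PL item's information curve over [-4, 4]
# (illustrative a and b, not fitted values)
info_2pl <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))
  a^2 * p * (1 - p)
}

area <- integrate(info_2pl, lower = -4, upper = 4, a = 1.5, b = 0)$value
# area is just under a = 1.5; the tails outside [-4, 4] hold the remainder
```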

3.5.1 Information by item

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_2pl,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post description notes label outcome Information
z26 Z22 int(poly) z26_Z22 unchanged 2.3802120
z20 Z17 (exp)' z20_Z17 unchanged 2.1756893
z21 Z18 (ln(sin))' z21_Z18 unchanged 2.0980069
z22 Z19 hyp.functions z22_Z19 unchanged 1.9933525
z9 Z8 trig.ineq. z9_Z8 unchanged 1.9831140
z23 Z20 slope tangent z23_Z20 unchanged 1.9408823
z27 Z23 int(exp) z27_Z23 unchanged 1.8936276
z28 Z24 Int = 0 z28_Z24 unchanged 1.8527290
z10 Z9 trig.identity z10_Z9 unchanged 1.7433688
z11 Z10 period z11_Z10 unchanged 1.6077264
z33 Z29 difference vectors z33_Z29 unchanged 1.5215297
z6 Z5 logs z6_Z5 unchanged 1.5147947
z14 Z12 limit z14_Z12 unchanged 1.4742547
z5 Z4 logs z5_Z4 unchanged 1.4257166
z24 Z21 IVT z24_Z21 unchanged 1.4143219
z16 Z14 diff.quotient z16_Z14 unchanged 1.4056184
z7 Z6 graph translation z7_Z6 unchanged 1.3700318
Z30 normal vector updated version of A34 z0_Z30 added 1.3176397
z3 Z2 arithmetic rules z3_Z2 unchanged 1.2944966
z32 Z28 FtoC algebra z32_Z28 unchanged 1.2929143
z19 quotient rule z19_Z0 removed 1.2709396
z1 Z1 linear function z1_Z1 unchanged 1.2607121
z30 FtoC graph adjusted to B26 z30_Z0 removed 1.2590765
z31 Z27 int(abs) z31_Z27 unchanged 1.2511116
z34 normal vector adjusted to B30 z34_Z0 removed 1.2255439
z17 Z15 graph f' z17_Z15 unchanged 1.2156068
z8 Z7 sine pi/3 z8_Z7 unchanged 1.1759131
z4 Z3 Easy ineq. z4_Z3 unchanged 1.1498403
Z31 vector product z0_Z31 added 1.0840039
z18 Z16 product rule z18_Z16 unchanged 1.0825203
Z26 FtoC graph updated version of A30 z0_Z26 added 1.0127943
z35 intersect planes z35_Z0 removed 0.9911898
z36 parallel planes z36_Z0 removed 0.8938061
z12 rational exponents adjusted to B11 z12_Z0 removed 0.8758781
z29 Z25 int even funct z29_Z25 unchanged 0.8301225
z15 Z13 series z15_Z13 unchanged 0.8287906
Z32 scalar product z0_Z32 added 0.8117541
Z11 rational exponents updated version of A12 z0_Z11 added 0.7690007
z13 ellipsoid z13_Z0 removed 0.7074127
z2 3D z2_Z0 removed 0.5473019
z25 velocity z25_Z0 removed 0.5210397

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_2pl,
               c(-2, 2),
               which.items = .x) %>% pull(Info)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post description notes label outcome Information
z26 Z22 int(poly) z26_Z22 unchanged 2.1496515
z20 Z17 (exp)' z20_Z17 unchanged 2.0368241
z21 Z18 (ln(sin))' z21_Z18 unchanged 2.0351851
z22 Z19 hyp.functions z22_Z19 unchanged 1.9204031
z9 Z8 trig.ineq. z9_Z8 unchanged 1.9053008
z23 Z20 slope tangent z23_Z20 unchanged 1.8528009
z27 Z23 int(exp) z27_Z23 unchanged 1.7945879
z28 Z24 Int = 0 z28_Z24 unchanged 1.7559052
z10 Z9 trig.identity z10_Z9 unchanged 1.6363424
z11 Z10 period z11_Z10 unchanged 1.4708151
z14 Z12 limit z14_Z12 unchanged 1.3203486
z5 Z4 logs z5_Z4 unchanged 1.2696657
z24 Z21 IVT z24_Z21 unchanged 1.2437250
z6 Z5 logs z6_Z5 unchanged 1.2360528
z33 Z29 difference vectors z33_Z29 unchanged 1.1534582
z7 Z6 graph translation z7_Z6 unchanged 1.1156355
z32 Z28 FtoC algebra z32_Z28 unchanged 1.0872317
z3 Z2 arithmetic rules z3_Z2 unchanged 1.0630819
z19 quotient rule z19_Z0 removed 1.0346021
z30 FtoC graph adjusted to B26 z30_Z0 removed 1.0059374
z17 Z15 graph f' z17_Z15 unchanged 1.0022925
z16 Z14 diff.quotient z16_Z14 unchanged 0.9940931
z8 Z7 sine pi/3 z8_Z7 unchanged 0.9682115
z31 Z27 int(abs) z31_Z27 unchanged 0.9193184
Z30 normal vector updated version of A34 z0_Z30 added 0.9180818
z4 Z3 Easy ineq. z4_Z3 unchanged 0.9070485
Z31 vector product z0_Z31 added 0.8594470
z34 normal vector adjusted to B30 z34_Z0 removed 0.8404270
z18 Z16 product rule z18_Z16 unchanged 0.7834848
Z26 FtoC graph updated version of A30 z0_Z26 added 0.7456596
z36 parallel planes z36_Z0 removed 0.6208678
z35 intersect planes z35_Z0 removed 0.6115641
z29 Z25 int even funct z29_Z25 unchanged 0.5653949
z15 Z13 series z15_Z13 unchanged 0.5615278
z12 rational exponents adjusted to B11 z12_Z0 removed 0.5473130
Z32 scalar product z0_Z32 added 0.5278274
Z11 rational exponents updated version of A12 z0_Z11 added 0.4880861
z13 ellipsoid z13_Z0 removed 0.4075948
z1 Z1 linear function z1_Z1 unchanged 0.2991127
z25 velocity z25_Z0 removed 0.2378399
z2 3D z2_Z0 removed 0.2369482

3.5.2 Evaluating the changes

info_comparison_data <- item_info_data %>% 
  mutate(item = as.character(item)) %>% 
  left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>% 
  group_by(theta) %>% 
  summarise(
    items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
    items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
    items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
    items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0)),
    n_added = sum(ifelse(outcome %in% c("added"), 1, 0)),
    items_added_mean = sum(ifelse(outcome %in% c("added"), info_y, 0)) / n_added,
    items_removed_mean = sum(ifelse(outcome %in% c("removed"), info_y, 0)) / sum(ifelse(outcome %in% c("removed"), 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y")
test_info_total_plot <- info_comparison_data %>% 
  filter(items %in% c("pre", "post")) %>% 
  mutate(items = case_when(items == "pre" ~ "Version 1", items == "post" ~ "Version 2")) %>% 
  #mutate(items = fct_relevel(items, "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
  geom_line() +
  scale_colour_brewer("ETH s21t", palette = "Set1") +
  scale_linetype_manual("ETH s21t", values = c("dashed", "solid")) +
  labs(x = "Ability", y = "Information") +
  theme_minimal() +
  theme(legend.position = "top",
        legend.title = element_text(face = "bold"))

#test_info_total_plot
ggsave("output/eth_pre-vs-post_info.pdf", units = "cm", width = 8, height = 8)
test_info_changes_plot <- info_comparison_data %>% 
  filter(!items %in% c("pre", "post")) %>% 
  mutate(panel = ifelse(str_detect(items, "_mean"), "Mean information\nper question", "Total information")) %>% 
  mutate(panel = fct_rev(panel)) %>% 
  mutate(items = str_remove(items, "_mean")) %>% 
  #mutate(items = fct_relevel(items, "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
  geom_line() +
  scale_colour_brewer("Questions", palette = "Paired", direction = -1) +
  scale_linetype_manual("Questions", values = c("solid", "dashed")) +
  facet_wrap(vars(panel), scales = "free_y") +
  labs(x = "Ability", y = "Information") +
  theme_minimal() +
  theme(legend.position = "top")

#test_info_changes_plot
ggsave("output/eth_pre-vs-post_changes.pdf", units = "cm", width = 16, height = 8)
test_info_total_plot + test_info_changes_plot + patchwork::plot_layout(widths = c(2, 3))

ggsave("output/eth_pre-vs-post_info-summary.pdf", units = "cm", width = 16, height = 8)

This shows that the added questions offer, on average, slightly more information than the questions they replaced (though there are fewer of them, hence the lower total information shown above). The difference is largest in the ability range \(0\leq\theta\leq2\).

3.6 Response curves

trace_data <- probtrace(fit_2pl, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  separate(level, into = c("item", NA, "score"), sep = "\\.")

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()

This plot combines all the trace lines onto a single set of axes:

plt <- trace_data %>% 
  filter(score == 1) %>% 
  left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>% 
  mutate(item_removed = (outcome == "removed")) %>% 
  ggplot(aes(x = theta, y = y, colour = item, text = item)) +
  geom_line(aes(linetype = outcome)) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  scale_linetype_manual("outcome", values = c("unchanged" = "solid", "removed" = "dashed", "added" = "twodash")) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")

That is quite a busy plot, so here we focus only on the changes:

plt <- trace_data %>% 
  filter(score == 1) %>% 
  left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>% 
  filter(outcome != "unchanged") %>% 
  mutate(item_removed = (outcome == "removed")) %>% 
  ggplot(aes(x = theta, y = y, colour = item, text = item)) +
  geom_line(aes(linetype = outcome)) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  scale_linetype_manual("outcome", values = c("removed" = "dashed", "added" = "solid")) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed-highlight.pdf", width = 20, height = 14, units = "cm")

4 Factor analysis for the new test only

4.1 Factor analysis setup

Here we repeat the factor analysis, this time using only the data from the new version of the test.

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("Z0")) %>% 
  filter(test_version == "post") %>% 
  select(-class, -year, -test_version)

The parameters package provides functions that check whether the data are suitable for factor analysis and, if so, suggest how many factors should be retained.

structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

4.1.0.1 Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.97).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(351) = 39041.85, p < .001).
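Bartlett's statistic reported above can be computed directly from the correlation matrix; a minimal base-R sketch on toy data (the helper name bartlett_sphericity is ours, not from the parameters package):

```r
# Bartlett's test of sphericity: tests H0 that the correlation matrix
# is the identity, i.e. that there are no correlations to factor-analyse.
# Base-R sketch; `scores` is any numeric data frame of item scores.
bartlett_sphericity <- function(scores) {
  n <- nrow(scores)
  p <- ncol(scores)
  R <- cor(scores)
  chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Toy data with one induced correlation, so the test should reject H0.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(300), ncol = 3))
toy$V2 <- toy$V1 + rnorm(100, sd = 0.5)
bartlett_sphericity(toy)
```

A significant result (as here, and as in the output above) only says the correlation matrix is not the identity; it does not by itself say how many factors to extract.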

4.1.0.2 Method Agreement Procedure:

The choice of 3 dimensions is supported by 6 (31.58%) methods out of 19 (Bentler, CNG, Optimal coordinates, Parallel analysis, Kaiser criterion, Scree (SE)).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 4
2 1
3 6
4 1
5 1
6 1
10 2
24 2
25 1

The scree plot shows the eigenvalues associated with each factor:

fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  6  and the number of components =  NA
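The idea behind parallel analysis is to retain only factors whose eigenvalues exceed those obtained from random data of the same size. A toy base-R illustration of that comparison (not the psych implementation, which adds resampling refinements):

```r
# Parallel analysis, schematically: compare the eigenvalues of the
# observed correlation matrix with the average eigenvalues from random
# data of the same dimensions. Toy data, not the real test scores.
set.seed(42)
obs <- matrix(rnorm(200 * 5), ncol = 5)
obs[, 2] <- obs[, 1] + rnorm(200, sd = 0.3)  # induce one strong factor

eig_obs <- eigen(cor(obs))$values
eig_rand <- rowMeans(replicate(
  50, eigen(cor(matrix(rnorm(200 * 5), ncol = 5)))$values
))

sum(eig_obs > eig_rand)  # number of factors/components to retain
```

Random data produce first eigenvalues above 1 purely by chance, which is why this criterion is stricter than the Kaiser "eigenvalue > 1" rule.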

Based on the dominant first eigenvalue in the scree plot, there is clear support for a 1-factor solution; we also consider the 4-factor solution for comparison.

4.2 1-factor solution

Show factanal output
fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   z1_Z1   z3_Z2   z4_Z3   z5_Z4   z6_Z5   z7_Z6   z8_Z7   z9_Z8  z10_Z9 z11_Z10 
##    0.95    0.77    0.80    0.73    0.77    0.76    0.78    0.62    0.66    0.70 
## z14_Z12 z15_Z13 z16_Z14 z17_Z15 z18_Z16 z20_Z17 z21_Z18 z22_Z19 z23_Z20 z24_Z21 
##    0.71    0.88    0.83    0.79    0.86    0.65    0.59    0.61    0.62    0.75 
## z26_Z22 z27_Z23 z28_Z24 z29_Z25 z31_Z27 z32_Z28 z33_Z29 
##    0.70    0.65    0.65    0.88    0.76    0.80    0.87 
## 
## Loadings:
##         Factor1
## z5_Z4   0.52   
## z9_Z8   0.61   
## z10_Z9  0.59   
## z11_Z10 0.55   
## z14_Z12 0.54   
## z20_Z17 0.59   
## z21_Z18 0.64   
## z22_Z19 0.63   
## z23_Z20 0.62   
## z24_Z21 0.50   
## z26_Z22 0.55   
## z27_Z23 0.59   
## z28_Z24 0.59   
## z1_Z1          
## z3_Z2   0.48   
## z4_Z3   0.45   
## z6_Z5   0.48   
## z7_Z6   0.49   
## z8_Z7   0.47   
## z15_Z13 0.35   
## z16_Z14 0.42   
## z17_Z15 0.46   
## z18_Z16 0.38   
## z29_Z25 0.34   
## z31_Z27 0.49   
## z32_Z28 0.44   
## z33_Z29 0.36   
## 
##                Factor1
## SS loadings       6.88
## Proportion Var    0.25
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 3038.4 on 324 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
z21_Z18 0.6413171 (ln(sin))'
z22_Z19 0.6269824 hyp.functions
z23_Z20 0.6185782 slope tangent
z9_Z8 0.6130896 trig.ineq.
z28_Z24 0.5947277 Int = 0
z27_Z23 0.5943194 int(exp)
z20_Z17 0.5917950 (exp)'
z10_Z9 0.5868835 trig.identity
z11_Z10 0.5497253 period
z26_Z22 0.5456825 int(poly)
z14_Z12 0.5411922 limit
z5_Z4 0.5236613 logs
z24_Z21 0.5047935 IVT
z7_Z6 0.4898908 graph translation
z31_Z27 0.4851195 int(abs)
z6_Z5 0.4770059 logs
z3_Z2 0.4753834 arithmetic rules
z8_Z7 0.4715737 sine pi/3
z17_Z15 0.4605594 graph f'
z4_Z3 0.4480780 Easy ineq.
z32_Z28 0.4439718 FtoC algebra
z16_Z14 0.4156706 diff.quotient
z18_Z16 0.3787153 product rule
z33_Z29 0.3636680 difference vectors
z15_Z13 0.3456321 series
z29_Z25 0.3434776 int even funct
z1_Z1 0.2231657 linear function

The questions that load most strongly on this factor are very standard calculations in integration, differentiation, and trigonometry – which is consistent with the aim of the test.
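In a 1-factor model, each item's uniqueness is simply one minus its squared loading (its communality), which links the two blocks of the factanal output. For example, for z21_Z18:

```r
# Uniqueness of an item in a 1-factor model = 1 - loading^2.
# z21_Z18 has loading 0.6413 and a reported uniqueness of 0.59.
loading <- 0.6413171
round(1 - loading^2, 2)  # 0.59, matching the Uniquenesses block above
```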

4.3 4-factor solution

Here we also investigate the 4-factor solution, to see whether these factors are interpretable.

Show factanal output
fitfact4 <- factanal(item_scores_B, factors = 4, rotation = "varimax")
print(fitfact4, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 4, rotation = "varimax")
## 
## Uniquenesses:
##   z1_Z1   z3_Z2   z4_Z3   z5_Z4   z6_Z5   z7_Z6   z8_Z7   z9_Z8  z10_Z9 z11_Z10 
##    0.89    0.73    0.74    0.69    0.74    0.72    0.76    0.55    0.64    0.69 
## z14_Z12 z15_Z13 z16_Z14 z17_Z15 z18_Z16 z20_Z17 z21_Z18 z22_Z19 z23_Z20 z24_Z21 
##    0.67    0.85    0.74    0.72    0.85    0.53    0.53    0.57    0.60    0.64 
## z26_Z22 z27_Z23 z28_Z24 z29_Z25 z31_Z27 z32_Z28 z33_Z29 
##    0.58    0.61    0.61    0.84    0.73    0.77    0.80 
## 
## Loadings:
##         Factor1 Factor2 Factor3 Factor4
## z20_Z17 0.58                           
## z21_Z18 0.53    0.36                   
## z26_Z22 0.52            0.36           
## z1_Z1                   0.31           
## z3_Z2           0.34    0.35           
## z4_Z3                   0.35           
## z5_Z4           0.41                   
## z6_Z5                   0.34           
## z7_Z6                   0.38           
## z8_Z7           0.34                   
## z9_Z8           0.35            0.49   
## z10_Z9          0.40                   
## z11_Z10                         0.32   
## z14_Z12         0.41                   
## z15_Z13                                
## z16_Z14         0.46                   
## z17_Z15                 0.37    0.32   
## z18_Z16                                
## z22_Z19 0.48    0.32                   
## z23_Z20 0.36    0.30            0.38   
## z24_Z21                 0.37    0.42   
## z27_Z23 0.46    0.31                   
## z28_Z24 0.32                    0.43   
## z29_Z25 0.36                           
## z31_Z27                         0.36   
## z32_Z28         0.31                   
## z33_Z29                 0.37           
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       2.42    2.19    1.84    1.73
## Proportion Var    0.09    0.08    0.07    0.06
## Cumulative Var    0.09    0.17    0.24    0.30
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 812.03 on 249 degrees of freedom.
## The p-value is 7.1e-61
load4 <- tidy(fitfact4)
ggplot(load4, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()
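As a sanity check on the variance table in the factanal output above, "Proportion Var" is each factor's sum of squared loadings divided by the number of items (27 in item_scores_B):

```r
# SS loadings from the 4-factor fit, divided by the 27 items,
# reproduce the "Proportion Var" and "Cumulative Var" rows.
ss_loadings <- c(2.42, 2.19, 1.84, 1.73)
round(ss_loadings / 27, 2)          # 0.09 0.08 0.07 0.06
round(cumsum(ss_loadings) / 27, 2)  # 0.09 0.17 0.24 0.30
```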

main_factors <- load4 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)
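The pipeline above assigns each item to the factor with the largest absolute loading. The same logic in base R, on a toy loading matrix (the values are illustrative, not the fitted loadings):

```r
# Assign each item to its "main" factor: the column with the
# largest absolute loading. Toy loadings for illustration only.
loadings <- rbind(
  item_a = c(fl1 = 0.58, fl2 = 0.18, fl3 = 0.29),
  item_b = c(fl1 = 0.19, fl2 = 0.46, fl3 = 0.03),
  item_c = c(fl1 = 0.10, fl2 = 0.09, fl3 = -0.37)
)
main_factor <- colnames(loadings)[max.col(abs(loadings))]
setNames(main_factor, rownames(loadings))
# item_a item_b item_c
#  "fl1"  "fl2"  "fl3"
```

Note that the sign is discarded: item_c is assigned to fl3 even though its loading there is negative.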

load4 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(test_versions %>% select(variable = label, description), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange questions by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = abs(.) > 0.4)
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 fl3 fl4 description
fl1 z20_Z17 0.582 0.18 0.285 0.116 (exp)’
fl1 z21_Z18 0.526 0.362 0.185 0.173 (ln(sin))’
fl1 z26_Z22 0.521 0.102 0.358 0.106 int(poly)
fl1 z22_Z19 0.484 0.321 0.154 0.265 hyp.functions
fl1 z27_Z23 0.464 0.312 0.145 0.235 int(exp)
fl1 z29_Z25 0.363 0.116 0.083 0.094 int even funct
fl2 z16_Z14 0.185 0.457 0.025 0.139 diff.quotient
fl2 z14_Z12 0.29 0.412 0.236 0.126 limit
fl2 z5_Z4 0.218 0.408 0.259 0.155 logs
fl2 z10_Z9 0.275 0.401 0.232 0.255 trig.identity
fl2 z8_Z7 0.186 0.344 0.201 0.207 sine pi/3
fl2 z32_Z28 0.262 0.312 0.049 0.244 FtoC algebra
fl2 z18_Z16 0.225 0.264 0.089 0.161 product rule
fl2 z15_Z13 0.052 0.237 0.233 0.19 series
fl3 z7_Z6 0.208 0.165 0.379 0.255 graph translation
fl3 z33_Z29 0.115 0.086 0.372 0.192 difference vectors
fl3 z17_Z15 0.147 0.132 0.367 0.318 graph f’
fl3 z4_Z3 0.07 0.287 0.351 0.222 Easy ineq.
fl3 z3_Z2 0.173 0.336 0.348 0.103 arithmetic rules
fl3 z6_Z5 0.248 0.268 0.345 0.093 logs
fl3 z1_Z1 0.104 0.028 0.307 0.028 linear function
fl4 z9_Z8 0.199 0.346 0.224 0.488 trig.ineq.
fl4 z28_Z24 0.325 0.248 0.194 0.431 Int = 0
fl4 z24_Z21 0.15 0.122 0.374 0.423 IVT
fl4 z23_Z20 0.357 0.305 0.185 0.383 slope tangent
fl4 z31_Z27 0.142 0.282 0.207 0.361 int(abs)
fl4 z11_Z10 0.233 0.266 0.295 0.32 period

The first factor is again calculations, but this time only in calculus (i.e. integrals and derivatives).

The second factor seems to capture more abstract material: limits, rules for logarithms, and so on. That could well be a category of its own.

The third factor is interesting. It is clearly about graphical interpretation: the items are set in quite different contexts, but they clearly belong together.

The fourth factor does not have a clear interpretation.

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots